Brill’s PoS Tagger with Extended Lexical Templates for Hungarian

Author

  • Beáta Megyesi
Abstract

In this paper, Brill’s rule-based PoS tagger is tested and adapted to Hungarian. The system as it stands does not reach as high an accuracy for Hungarian as it does for English because of the structural differences between the two languages: Hungarian has rich morphology, is agglutinative with inflectional characteristics, and has free word order. The tagger has the greatest difficulty with parts of speech belonging to open classes because of their complicated morphological structure. Tagging accuracy can be increased from 83% to 97% by changing the rule-generating mechanisms, namely the lexical templates in the lexical training module.
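
To make the notion of a lexical template more concrete, here is a minimal Python sketch. The abstract does not say how the templates were changed, so this is an illustration only: it assumes that a template proposes suffix-based retagging rules and that the maximum suffix length is a tunable parameter. The names `SuffixRule` and `candidate_suffixes`, the tags `N` and `UNK`, and the example word are hypothetical and are not taken from the paper or from Brill's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    """Hypothetical lexical rule: retag a word that ends in `suffix`."""
    suffix: str
    new_tag: str

    def apply(self, word: str, tag: str) -> str:
        # Fire only on a proper suffix, i.e. the word is longer than the suffix.
        if len(word) > len(self.suffix) and word.endswith(self.suffix):
            return self.new_tag
        return tag


def candidate_suffixes(word: str, max_len: int) -> list[str]:
    """Return every proper suffix of `word` of at most `max_len` characters."""
    longest = min(max_len, len(word) - 1)
    return [word[-n:] for n in range(1, longest + 1)]


# "kertekben" = kert (garden) + -ek (plural) + -ben (inessive case).
short = candidate_suffixes("kertekben", 4)  # ['n', 'en', 'ben', 'kben']
longer = candidate_suffixes("kertekben", 6)  # adds 'ekben' and 'tekben'

# Assumed rule: words ending in the stacked endings "-ekben" are nouns.
rule = SuffixRule(suffix="ekben", new_tag="N")
retagged = rule.apply("kertekben", "UNK")  # 'N'
```

In this sketch, a window of four characters never covers the full stacked ending "-ekben", while a window of six does. This is one plausible reason why longer affix windows could help with an agglutinative language, offered here as an assumption rather than as the paper's stated method.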

Similar resources

Improving Brill's Pos Tagger for an Agglutinative Language

In this paper Brill's rule-based PoS tagger is tested and adapted for Hungarian. It is shown that the present system does not obtain as high accuracy for Hungarian as it does for English (and other Germanic languages) because of the structural difference between these languages. Hungarian, unlike English, has rich morphology, is agglutinative with some inflectional characteristics and has fairl...

Brill’s rule-based PoS tagger

Eric Brill introduced a PoS tagger in 1992 that was based on rules, or transformations as he calls them, where the grammar is induced directly from the training corpus without human intervention or expert knowledge. The only additional component necessary is a small, manually and correctly annotated corpus (the training corpus), which serves as input to the tagger. The system is then able to deriv...

Training and Evaluation of POS Taggers on the French MULTITAG Corpus

The explicit introduction of morphosyntactic information into statistical machine translation approaches is receiving an important focus of attention. The current freely available Part of Speech (POS) taggers for the French language are based on a limited tagset which does not account for some flectional particularities. Moreover, there is a lack of a unified framework of training and evaluatio...

Comparison of different POS Tagging Techniques (N-Gram, HMM and Brill’s tagger) for Bangla

There are different approaches to the problem of assigning each word of a text with a parts-of-speech tag, which is known as Part-Of-Speech (POS) tagging. In this paper we compare the performance of a few POS tagging techniques for Bangla language, e.g. statistical approach (n-gram, HMM) and transformation based approach (Brill’s tagger). A supervised POS tagging approach requires a large amoun...

High-Performance Tagging on Medical Texts

We ran both Brill’s rule-based tagger and TNT, a statistical tagger, with a default German newspaper-language model on a medical text corpus. Supplied with limited lexicon resources, TNT outperforms the Brill tagger with state-of-the-art performance figures (close to 97% accuracy). We then trained TNT on a large annotated medical text corpus, with a slightly extended tagset that captures certai...

Publication date: 1999